WO2001027809A2 - Visualizing relations in data sets - Google Patents

Visualizing relations in data sets

Info

Publication number
WO2001027809A2
Authority
WO
WIPO (PCT)
Prior art keywords
arrays
array
targets
virtual
expression
Prior art date
Application number
PCT/NL2000/000742
Other languages
English (en)
Other versions
WO2001027809A3 (fr
Inventor
Wilhelmus Maria Van Der Krieken
Jan Kodde
Original Assignee
Plant Research International B.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Plant Research International B.V.
Priority to AU13103/01A (AU1310301A)
Publication of WO2001027809A2
Publication of WO2001027809A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Definitions

  • Comparison of data files shows whether there are similarities or differences in the files.
  • The invention described here relates to a technique for comparing two or more data sets by showing the differences and/or similarities between the sets in figures. This technique is important, for instance, for comparing data from experiments with micro-arrays, although other data sets can also be compared with it.
  • Other examples include data sets from cosmology, mathematics, population screening, patient screening, sociological research, physical determinations and biotechnology.
  • DNA micro-arrays: the analysis of data files of DNA micro-arrays (and DNA chips) is further elaborated here.
  • Files of DNA micro-arrays are an example of biotechnological data files (other examples are files of dot blots, cDNA-AFLP, Northern blots, Southern blots and protein arrays from proteomics).
  • The use of DNA micro-arrays is growing enormously in molecular genetics. In this technique, DNA fragments associated with particular genes are placed as small "spots" on a glass plate (the sequence of the DNA fragments and the function of the associated gene can be known after analysis).
  • This DNA, RNA or cDNA is marked with a particular colour (for instance yellow) and is therefore referred to as the probe. After hybridization with the probe, the intensity of the colouring is proportional to the (relative) expression level of the gene.
  • Two DNA or RNA probes can also be tested simultaneously.
  • The second probe originates, for instance, from tissue which has undergone a different treatment from the tissue from which the first probe was made.
  • The second probe can be marked with a different colour (for instance red).
  • Targets on the micro-array which are yellow or red after hybridization are expressed in only one of the tissues, and targets with an intermediate colour (orange-like colours) are expressed in both tissues.
  • The object of the analysis program described here is to visualize expression data obtained using micro-arrays (or data from other data files) in the form of virtual arrays.
  • The array elements are rearranged such that genes with a comparable expression pattern are grouped together at a meaningful position in the virtual array. Mutual relations thereby become visible in a rapid, ordered and clear manner.
  • The targets associated with a treatment with a determined specificity are placed together in a figure representing a virtual array.
  • The targets which are expressed in another treatment are also placed together, and overlaps within these groups of genes are also visualized. Relations in gene expression in different treatments thereby become immediately apparent. Genes which are always expressed together and form, as it were, networks of gene expression are also identified. The rearrangement is such that the information about the original position of the targets is saved, so that it is possible to retrieve which gene (function and sequence) is specifically expressed.
  • The new location of the target in this virtual array is determined by its expression level relative to the expression of the control genes or of genes which are expressed in other treatments. In a single experiment (1 determination / 1 physiological condition) the results are easy to compare. If, however, a plurality of mutually related conditions (for instance a time array and/or a concentration array) have to be compared, the location of the target in the virtual array also depends on the result of the other measurements within the inputted array.
  • Fig. 1 shows a diagram of a regeneration process in a culture medium as an embodiment of the present invention;
  • Fig. 2 shows a graphic representation of the result of the experiment according to fig. 1;
  • Fig. 3 is a graphic representation of a regrouping of the representation of fig. 2;
  • Fig. 4 is a graphic representation of the groups of genes derived from fig. 3 which are expressed;
  • Fig. 5 is a graphic representation of a regrouping obtained according to method 1 below;
  • Fig. 6 is a graphic representation of a regrouping obtained according to method 2 below.
  • Figure 1 shows a typical example of an experiment in which gene expression is studied, by means of micro-arrays, in relation to different physiological conditions.
  • Virtual arrays are made by ordering individual targets on the arrays such that they correspond as much as possible with the physiological incubation conditions in the experiment.
  • The position of a particular target on the virtual array is thereby no longer random but, on the contrary, correlated to its function at a determined physiological condition in the experiment.
  • Fig. 3 shows the result of an ideal virtual arrangement of the results of fig. 1.
  • The algorithms are capable of (virtually) ordering the results obtained in experiments with micro-arrays.
  • The micro-arrays consist of DNA targets containing copies of complete genes or parts thereof. In the experiments, the expression of the genes present on the micro-array is measured.
  • A known mRNA pool, isolated from tissue, is converted into labeled cDNA.
  • The labeled cDNAs can bind to complementary DNA in the targets.
  • The expression level of a gene determines its proportion in the mRNA pool.
  • This expression level can thus be measured by determining the quantity of label (of the cDNA) on a particular target.
  • The different conditions under which the tissues are incubated are such that they have a determined mutual relation (for instance a concentration array of a determined substance and/or a time array).
  • The example shows an experiment with two substances (auxin and cytokinin).
  • The function and relation of the targets can be determined from the obtained virtual arrangement.
  • The data are expression levels which are measured on the micro-arrays (quantity of one or more labels on a target) in the above stated experiments.
  • These data are placed in columns (or rows).
  • Each column (or row) represents the data obtained at a determined incubation condition (determined concentration of auxin or cytokinin).
  • A row (or column) represents the expression results of a target under all incubation conditions.
  • The expression values are first classified into a limited number of levels. Each level represents an expression value between two preselected limit values. Targets whose expression value lies below the lowest limit value are assigned the level value zero (that is, the value zero is assigned to low expression levels). The final outcome can be optimized by choosing the limit values of the different levels. If down-regulation of genes is also being studied, the zero value can for instance be assigned to another level.
  • The targets in the different columns are clustered. This can be done for instance as follows. A binary code is assigned to each target: one binary number (0 or 1) is assigned to each expression value of a determined column. One of the expression levels, for instance expression level 0, is given the binary number 0 and all other expression levels the value 1. This results in a row of numbers which forms the binary code. All targets with the same binary code are clustered.
  • The arrays representing the different incubation conditions are placed in a determined sequence/position. This placing represents for instance the test set-up. In this case the incubation with the lowest concentration of substance 1 is placed on the left and that with the highest on the right, with the lowest concentration of substance 2 at the bottom and the highest concentration of substance 2 at the top (as is usual in the plotting of figures).
  • Calculation of the optimal position of the clusters on the virtual arrays.
  • The virtual arrangement of clusters within the arrays can be based on the following. All targets within a cluster have the same calculated position. If a cluster is expressed at a high concentration of substance 1 and of substance 2, this cluster is placed at the top right within all virtual arrays. In other words, the X and Y coordinates of the cluster (and of the targets within this cluster) depend on the expression pattern of the clustered targets in the different arrays.
  • The targets can be placed on the virtual arrays.
  • The sequence in which clusters are placed can be in order of increasing or decreasing X and/or Y value, in accordance with increasing cluster size, and so on. If during placing the calculated position is already occupied, the closest empty position is chosen. The visualization becomes clearer by beginning with the target which has the highest total expression within a cluster. If during placing of a new cluster the calculated position is occupied, another position is chosen. This position can for instance be the closest position which is still available, although other placing strategies can also be envisaged. All targets within the cluster then acquire the X and Y coordinates of this new position.
  • For a good overview of the virtual arrays (good delimitation of different clusters), it can be advantageous to make the virtual arrays larger than the original arrays. In this manner an equal number of virtual and original arrays is generated.
  • The expression values on the virtual arrays are visualized by means of colours, grey tones or hatchings which represent the expression level.
  • The virtual target also contains information about the original target (specification of the DNA, location on the original arrays, etc.).
  • The clustering can be based on more than two dimensions; the visualization (placing of the targets and clusters) is two-dimensional, and the expression level can be shown as a third dimension (via colour, hatching, etc.).
  • Method 2: placing of individual targets in virtual arrays so that relations in gene expression become apparent in relation to the set-up of the performed experiments.
  • This form of arrangement is based on the calculation of an individual X and Y coordinate for each target in the virtual array.
  • This calculation can for instance be done as follows: for each target, the expression on a determined array is multiplied by the position of this array relative to the other arrays in the physiological experiment in the X or Y direction respectively (in two-dimensional analysis; with more dimensions, calculations are of course performed with proportionally more coordinates).
  • The expression of a determined target of figure 1 in the array which is placed at the bottom left in the physiological set-up is for instance multiplied by 1 to calculate both the x and the y coordinate.
  • The expression of this target in the array to the right thereof is multiplied by 2 for the x coordinate and by 1 for the y coordinate.
  • The expression in the physiological condition above that is multiplied by 2 for both the x and the y coordinate. This is calculated for the target for all incubation conditions in the physiological experiment. All outcomes for a determined target are then added together. Finally, this number is divided by the sum of all expression levels of the target in all incubation conditions. This calculation is performed in order to determine the x and the y position of each target on the virtual array.
  • Form of the virtual array.
  • The form of the original array depends on the application onto the glass plate.
  • The form of this array (number of rows and columns) may consequently not match the physiological incubation conditions very well. It may therefore be advantageous to adapt the form of the virtual array to the performed experiments (see for instance figure 1).
  • The targets are sorted on the basis of these X and Y coordinates.
  • The form (number of columns with targets or number of rows with targets) of the virtual array is taken into account here. If for instance the number of columns (which equals the width of the array) is taken into account, the targets are first sorted on the basis of increasing or decreasing y coordinate. The sorted targets are then subdivided into groups the size of the array width. Sorting within these groups subsequently takes place on the basis of increasing or decreasing x coordinate.
  • The sorted targets are placed in the virtual array row by row or column by column.
  • The starting position (which of the four corners) and the manner of placing (rows or columns) depend on the sequence (the x or y coordinate first) and method (increasing or decreasing) of sorting. If a start is made with the x coordinate, the rearrangement is, with the algorithm used here, particularly optimized for the variable which is plotted on the x-axis. If a start is made with the y coordinate, the rearrangement is particularly optimized for the variable which is plotted on the y-axis. With other sorting and placing methods this may be different again. In all cases a skilled researcher can analyse the data quickly and in an orderly and reliable manner.
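
The level classification and binary-code clustering described for the first method can be illustrated with a minimal Python sketch. It is only one possible reading of the text: the function names, the limit values and the sample expression data are assumptions made for illustration and do not come from the patent.

```python
from bisect import bisect_right
from collections import defaultdict


def classify(value, limits):
    """Return the level of an expression value.

    `limits` is an ascending list of limit values. Values below the
    lowest limit get level 0; each further limit reached adds one level.
    """
    return bisect_right(limits, value)


def binary_code(levels, zero_level=0):
    """Give the chosen 'zero' level the digit 0 and every other level 1."""
    return tuple(0 if level == zero_level else 1 for level in levels)


def cluster_targets(expression, limits):
    """Group target names that share the same binary code."""
    clusters = defaultdict(list)
    for target, values in expression.items():
        levels = [classify(v, limits) for v in values]
        clusters[binary_code(levels)].append(target)
    return dict(clusters)


# Hypothetical data: four targets measured under three incubation conditions.
expression = {
    "t1": [5, 120, 300],
    "t2": [0, 200, 450],
    "t3": [150, 10, 0],
    "t4": [90, 95, 400],
}
limits = [50, 100, 250]  # assumed limit values separating levels 0 to 3
clusters = cluster_targets(expression, limits)
# t1 and t2 share the code (0, 1, 1) and therefore end up in one cluster.
```

Passing a different `zero_level` to `binary_code` corresponds to the remark that, when down-regulation is studied, the zero value can be assigned to another level.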
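
The placing rule of the first method (use the calculated position, or the closest empty position if it is occupied) can be sketched as follows. This is a simplified illustration, not the patented procedure: the calculated cluster positions are taken as given input, "closest" is assumed to mean smallest Euclidean distance with ties broken by row and then column, and each target is placed individually near its cluster position. All names are hypothetical.

```python
def nearest_free_cell(occupied, wanted, width, height):
    """Return the free (x, y) cell closest to `wanted`."""
    free = [
        (x, y)
        for y in range(height)
        for x in range(width)
        if (x, y) not in occupied
    ]
    if not free:
        raise ValueError("virtual array is full")
    wx, wy = wanted
    # Squared Euclidean distance; ties are broken by y, then x (an assumption).
    return min(free, key=lambda c: ((c[0] - wx) ** 2 + (c[1] - wy) ** 2, c[1], c[0]))


def place_clusters(clusters, width, height):
    """Place targets cluster by cluster on a width x height virtual array.

    `clusters` is a list of (wanted_xy, {target: total_expression}) pairs,
    already in the desired placing order. Within a cluster, the target with
    the highest total expression is placed first.
    """
    occupied = {}
    for wanted, targets in clusters:
        for name in sorted(targets, key=targets.get, reverse=True):
            cell = nearest_free_cell(occupied, wanted, width, height)
            occupied[cell] = name
    return {name: cell for cell, name in occupied.items()}


# Hypothetical example on a 3 x 3 virtual array.
placed = place_clusters(
    [
        ((2, 2), {"a": 10, "b": 30}),  # expressed at high levels of both substances
        ((0, 0), {"c": 5}),
        ((2, 2), {"d": 8}),            # same calculated position as the first cluster
    ],
    width=3,
    height=3,
)
```

Making `width` and `height` larger than the original array leaves empty cells between clusters, which matches the remark that larger virtual arrays give a better delimitation of clusters.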
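
The coordinate calculation and sorting of Method 2 can be illustrated with a short Python sketch. The coordinate of a target is the expression-weighted mean of the array positions, as described above; the sketch assumes that targets are sorted on y first and that the second sort inside each group uses the x coordinate. The function names and the sample data are hypothetical.

```python
def weighted_position(expression, positions):
    """Expression-weighted mean position of one target.

    `expression[i]` is the expression under incubation condition i and
    `positions[i]` is the (x, y) position of that condition in the set-up,
    counted from 1 at the bottom left.
    """
    total = sum(expression)
    if total == 0:
        return None  # not expressed anywhere, so the position is undefined
    x = sum(e * px for e, (px, _) in zip(expression, positions)) / total
    y = sum(e * py for e, (_, py) in zip(expression, positions)) / total
    return (x, y)


def arrange(coords, width):
    """Sort targets on y, cut into rows of `width`, then sort each row on x."""
    by_y = sorted(coords, key=lambda name: coords[name][1])
    rows = [by_y[i:i + width] for i in range(0, len(by_y), width)]
    return [sorted(row, key=lambda name: coords[name][0]) for row in rows]


# Hypothetical 2 x 2 set-up: substance 1 increases along x, substance 2 along y.
positions = [(1, 1), (2, 1), (1, 2), (2, 2)]
expression = {
    "d": [0, 0, 0, 10],
    "c": [0, 0, 10, 0],
    "b": [0, 10, 0, 0],
    "a": [10, 0, 0, 0],
}
coords = {name: weighted_position(values, positions) for name, values in expression.items()}
rows = arrange(coords, width=2)  # rows[0] holds the targets with the lowest y
```

A target expressed equally under all four conditions would land at (1.5, 1.5), the centre of the set-up, whereas a target expressed under a single condition lands exactly at that condition's position.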

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present invention relates to a method for comparing and/or analysing data files obtained, for example, in the form of one or more arrays or in matrix form. The data and relations of the arrays or matrices with mutual correlations are rearranged in a virtual matrix, so that these mutual relations or correlations can easily be made visible to a user, for example on a screen.
PCT/NL2000/000742 1999-10-15 2000-10-16 Visualizing relations in data sets WO2001027809A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU13103/01A AU1310301A (en) 1999-10-15 2000-10-16 Visualizing relations in data sets

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL1013297 1999-10-15
NL1013297A NL1013297C1 (nl) 1999-10-15 1999-10-15 Visualization of relations in data sets.

Publications (2)

Publication Number Publication Date
WO2001027809A2 (fr) 2001-04-19
WO2001027809A3 (fr) 2002-09-12

Family

ID=19770056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2000/000742 WO2001027809A2 (fr) 1999-10-15 2000-10-16 Visualizing relations in data sets

Country Status (3)

Country Link
AU (1) AU1310301A (fr)
NL (1) NL1013297C1 (fr)
WO (1) WO2001027809A2 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1388801A2 (fr) * 2002-08-08 2004-02-11 Agilent Technologies, Inc. Méthodes et système pour la visualisation et la manipulation simultanées de données de types multiples
EP1524612A2 (fr) * 2003-10-18 2005-04-20 Agilent Technologies, Inc. Affichage et manipulation de données
US6950756B2 (en) * 2003-02-05 2005-09-27 Agilent Technologies, Inc. Rearrangement of microarray scan images to form virtual arrays
US7353116B2 (en) 2003-07-31 2008-04-01 Agilent Technologies, Inc. Chemical array with test dependent signal reading or processing
US7825929B2 (en) 2003-04-04 2010-11-02 Agilent Technologies, Inc. Systems, tools and methods for focus and context viewing of large collections of graphs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EISEN M B ET AL: "Cluster analysis and display of genome-wide expression patterns" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE. WASHINGTON, US, vol. 95, December 1998 (1998-12), pages 14863-14868, XP002140966 ISSN: 0027-8424 *
WEINSTEIN JOHN N ET AL: "An information-intensive approach to the molecular pharmacology of cancer." SCIENCE (WASHINGTON D C), vol. 275, no. 5298, 1997, pages 343-349, XP002199806 ISSN: 0036-8075 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1388801A2 (fr) * 2002-08-08 2004-02-11 Agilent Technologies, Inc. Méthodes et système pour la visualisation et la manipulation simultanées de données de types multiples
EP1388801A3 (fr) * 2002-08-08 2006-02-22 Agilent Technologies, Inc. Méthodes et système pour la visualisation et la manipulation simultanées de données de types multiples
US8131471B2 (en) 2002-08-08 2012-03-06 Agilent Technologies, Inc. Methods and system for simultaneous visualization and manipulation of multiple data types
US6950756B2 (en) * 2003-02-05 2005-09-27 Agilent Technologies, Inc. Rearrangement of microarray scan images to form virtual arrays
US7825929B2 (en) 2003-04-04 2010-11-02 Agilent Technologies, Inc. Systems, tools and methods for focus and context viewing of large collections of graphs
US7353116B2 (en) 2003-07-31 2008-04-01 Agilent Technologies, Inc. Chemical array with test dependent signal reading or processing
EP1524612A2 (fr) * 2003-10-18 2005-04-20 Agilent Technologies, Inc. Affichage et manipulation de données
EP1524612A3 (fr) * 2003-10-18 2006-09-06 Agilent Technologies, Inc. Affichage et manipulation de données

Also Published As

Publication number Publication date
AU1310301A (en) 2001-04-23
NL1013297C1 (nl) 2001-04-18
WO2001027809A3 (fr) 2002-09-12

Similar Documents

Publication Publication Date Title
US6349144B1 (en) Automated DNA array segmentation and analysis
US7317820B2 (en) System and method for automatically identifying sub-grids in a microarray
US6633659B1 (en) System and method for automatically analyzing gene expression spots in a microarray
US6829376B2 (en) Computer software system, method, and product for scanned image alignment
US20030182066A1 (en) Method and processing gene expression data, and processing programs
US6699659B2 (en) Products and methods for analyzing nucleic acids including identification of substitutions, insertions and deletions
US20030087289A1 (en) Image analysis of high-density synthetic DNA microarrays
WO2001027809A2 (fr) Visualizing relations in data sets
WO2013171565A2 (fr) Procédé et système pour évaluer des molécules dans des échantillons biologiques en utilisant des images dérivées de micropuce
AU2004234996B2 (en) Array having substances fixed on support arranged with chromosomal order or sequence position information added thereto, process for producing the same, analytical system using the array and use of these
US20040181342A1 (en) System and method for automatically analyzing gene expression spots in a microarray
WO2001020998A1 (fr) Identification de medicaments au moyen d'un profilage de l'expression genique
EP1691311A1 (fr) Procédé, système et logiciel pour effectuer des interprétations biologiques d'expériences en microréseau
US20080123898A1 (en) System and Method for Automatically Analyzing Gene Expression Spots in a Microarray
EP1422514B1 (fr) Methode d'evaluation de l'uniformite des taches dans des reseaux
JP4266575B2 (ja) 遺伝子発現データの処理方法および処理プログラム
Kellam et al. Experimental use of DNA arrays
KR100437253B1 (ko) 마이크로어레이 모사 이미지 생성 시스템 및 그 방법
Ali et al. Developmental biology: an array of new possibilities
Wanke et al. The Analysis of Gene Expression and Cis-Regulatory Elements in Large Microarray Expression Datasets
JP2005518008A (ja) 遺伝子発現データを用いた遺伝子機能推定
KR20080013099A (ko) 마이크로어레이 이미지 분할 방법
Doumas et al. DNA microarrays in Plants
Xiao-yan et al. Experimental genomics: The application of DNA microarrays in cellular and molecular biology studies
Baans et al. Analysis of Normalization Method for DNA Microarray Data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWW Wipo information: withdrawn in national office

Ref document number: 2000974995

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2000974995

Country of ref document: EP

AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP