WO2013076665A1 - Interactive system for display and analysis of biomolecular data - Google Patents


Info

Publication number
WO2013076665A1
Authority
WO
WIPO (PCT)
Prior art keywords
tridimensional
graphical elements
displayed
graphical
electronic system
Application number
PCT/IB2012/056598
Other languages
French (fr)
Inventor
Gianluca VEZZADINI
Riccardo CORSI
Alessandro PIGLIA
Original Assignee
Kairos3D S.R.L.
Application filed by Kairos3D S.R.L.


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Definitions

  • the present invention is within the field of bioinformatics and relates to an interactive system for display and analysis of biomolecular data, in particular of genomic data, proteomic data and of biological and/or clinical and/or environmental data related thereto.
  • the present invention concerns an electronic system configured to display biomolecular data, to perform analysis and processing, such as statistical analysis and processing, of the displayed biomolecular data, and display the results of the performed analysis and processing.
  • “Sample” refers to the biological material on which analyses are performed, typically molecular determinations.
  • A sample may, for example, be a fragment of a patient's tumor tissue, in which case this document also uses the term “sample/patient”.
  • Gene: genes are known to be the basic functional units of the genome of living organisms. Each gene corresponds to a sequence of nucleic acids, in particular of DNA (deoxyribonucleic acid) or, more rarely, of RNA (ribonucleic acid), delimited by a start coordinate and an end coordinate that can be mapped on the genome of the species to which the gene belongs.
  • miRNA: miRNAs (microRNAs) are known to be small single-stranded RNA molecules of 20-22 nucleotides, produced by cells through transcription from genomic DNA. Like genes, they therefore have precise coordinates on the genome.
  • The term “gene” is used to refer to both genes in the strict sense and miRNAs, unless the latter are explicitly mentioned and/or distinguished.
  • “Expression” means a measurement of the abundance (or amount), in a test sample, of the RNA related to a given gene or miRNA.
  • Data matrix: in this document, the term “data matrix” means a matrix or table whose rows represent genes and whose columns represent samples. Each cell of the matrix therefore contains a value related to a certain gene measured in a given sample.
  • “Dataset” refers to a collection of data from molecular analyses (typically multiple analyses) and various annotations related to the same series of samples.
  • A dataset typically comprises multiple tables.
  • “Group” means a set of objects (genes, samples, etc.) grouped on the basis of characteristics that unite them or set them apart.
  • “Cluster” means a group created by appropriate computational algorithms, called “clustering algorithms”, that group items based on their similarities, calculated according to functions of choice (for example, Pearson correlation, Euclidean distance, etc.).
  • “3D graphics” means so-called 3D computer graphics, a technology for creating and displaying, via computer, static or moving images of objects on the basis of mathematical three-dimensional models of the objects to be represented, wherein these models are generated and processed by an electronic computer.
  • “Metadata” means a value calculated on the basis of several elements (for example genes, samples, etc.) using functions of choice.
  • For example, a “metasample” is a virtual sample resulting from the aggregation of samples belonging to a particular group, for which the average of the measurements is calculated for one or more genes of interest.
  • Examples of metadata are the mean, the standard deviation, the p-value of a statistical test, etc.
  • The data measured on tissue samples tend to be very large in volume, reaching in practice many thousands of values per measurement and, in theory, up to billions of values.
  • Several measurements are usually carried out, yielding multiple vectors of thousands of values for each sample.
  • Analyzing this type of data involves comparing a large number of samples, so the information to be processed is often in the form of matrices in which each row holds a specific value (usually related to a given gene) and each column holds the complete data of a sample.
  • A matrix of this type has many thousands of rows (individual genes) and usually hundreds of columns (the samples used in the case study).
  • This object is achieved by the present invention in that it relates to an electronic system for the display and analysis of biomolecular data, in particular of genomic data, of proteomic data and of biological and/or clinical and/or environmental data related thereto, as defined in the attached claims.
  • Figure 1 schematically illustrates an electronic system for the display and analysis of biomolecular data according to a preferred embodiment of the present invention;
  • Figure 2 shows an example of a three-dimensional virtual environment with three walls that can be displayed by the system shown in Figure 1;
  • Figures 3-5 show examples of three-dimensional Genomic Matrices at different levels of detail that can be displayed by the system shown in Figure 1;
  • Figure 7 shows a traditional graphical representation of Chromosome 12;
  • Figure 8 shows an example of a three-dimensional Object Map that can be displayed by the system shown in Figure 1;
  • Figure 9 shows an example of a three-dimensional Group Map that can be displayed by the system shown in Figure 1.
  • For simplicity, the present invention will be described with explicit reference to genomic data and, where appropriate, to medical and/or clinical data, without loss of generality.
  • the present invention may be advantageously used for the display and analysis, in general, of biomolecular data, and, in particular, of genomic data, of proteomic data and of biological and/or clinical and/or environmental data related thereto.
  • the present invention relates to an interactive electronic system for the display and analysis of biomolecular data.
  • The present invention is implemented by means of an electronic system for processing, calculating and displaying, such as a computer, that is specifically programmed, i.e., that runs a specific software program which, when executed, causes the system to implement the specific display and analysis functionalities for biomolecular data according to the present invention, described in detail below.
  • Figure 1 shows a block diagram of an example architecture of an electronic system (indicated as a whole by 1) for the display and analysis of biomolecular data according to a preferred embodiment of the present invention.
  • the electronic system 1 includes:
  • display means 12, for example a screen, operable to implement the specific functionalities of display and analysis of biomolecular data according to the present invention as will be described in detail hereinafter;
  • user input means 13 such as a mouse and/or a keyboard, configured to allow a user to interact, in use, with the system 1 in relation to the implementation of the specific functionalities of display and analysis of biomolecular data according to the present invention as will be described in detail hereinafter;
  • input/output means 14 such as a USB (Universal Serial Bus) port and/or an Ethernet port and/or a modem/router and/or a CD/DVD player/writer, configured to receive input data to be used for the implementation of the specific functionalities of display and analysis of biomolecular data according to the present invention as will be described in detail in the following, and/or export output data obtained through the implementation of the specific functionalities of display and analysis of biomolecular data according to the present invention as will be described in detail in the following, and
  • computing and processing means 15 for example one or more processors (CPUs) , that are
  • the present invention uses three-dimensional (3D) computer graphics technologies to build a virtual environment that represents in a symbolic way the input data.
  • A user can surf in this 3D virtual environment and interact with its different elements, which are subsets of the input dataset.
  • User interaction allows analysis paths to be built, using statistical tools to compare groups of data in order to highlight correlations.
  • the 3D virtual environment is interactive and can be freely surfed, examining various elements as needed.
  • Interaction also allows the user to select one or more objects, examine related detailed information and start analysis paths from the selected items.
  • The 3D virtual environment has a number of advanced surfing and interaction features (described in detail in the following paragraphs and sections), intended to provide the user with simple and effective tools for data analysis.
  • The present invention makes it possible to create a 3D virtual environment that includes a number of “walls”, i.e., individual three-dimensional (3D) graphical objects, each representing a part of the available information.
  • the user can control various display parameters and can interact with the graphical interface in order to perform wall-specific analyses.
  • The following 3D walls, described in more detail below, can be displayed by system 1:
  • Genomic Matrix, for displaying data matrices;
  • Chromosome Map, for displaying data in the context of a graphical representation of the chromosomes;
  • Object Map, for contextual representation of individual elements (e.g., patients, tissue samples, gene clusters) and their groupings;
  • Group Map, for representing groups of elements without information about the individual elements.
  • Figure 2 shows an example of a 3D virtual environment with three walls that can be displayed by system 1.
  • System 1 allows the resolution of the 3D virtual environment to be changed, showing different levels of detail depending on both the overall volume of data and the portion of the environment being viewed; dynamically changing the resolution enables both effective real-time display of very large datasets and detailed analysis of portions of a dataset.
  • The resolution level can be selected automatically by system 1 or set manually by the user.
  • A user can set the reference range of the values to be represented by defining a minimum, a maximum and a central value to be used as a reference (which is not necessarily the average of the minimum and maximum).
  • A set of groups can be defined over the fields of the matrix (rows or columns, that is, genes or samples); for example, a set of groups of rows can be defined, where each group contains one or more genes. This set of groups is complete (it contains all the genes or all the samples) and exclusive (the same item cannot be in two different groups). Furthermore, a hierarchy of clusters can be defined: given a set of groups, each group can in turn have sub-groups, thereby forming a tree whose leaves are the individual fields (genes or samples). Each level of the hierarchy contains a complete and exclusive set of groups. The hierarchical structure is used together with the dynamically changing resolution to vary the displayed level of detail. For example, an overall view of the Genomic Matrix can show only the higher-level clusters, while browsing toward and zooming in on individual groups can reveal elements of lower hierarchical levels, down to the single cell of the data matrix.
  • The 3D virtual environment can include a reference splitting plane at the reference value (central value) of a wall; this plane makes it easier to distinguish positive values (above the reference value) from negative ones.
  • A user can control the transparency level of the plane for optimal viewing of the positive/negative data.
  • Since both positive and negative values are represented in the 3D virtual environment, drawn differently with respect to a reference plane corresponding to the central value, it may sometimes be necessary to see elements located below the splitting plane. Instead of navigating down to see them, the user can simply invert the representation of positive and negative values, moving the positive values below the splitting plane and vice versa. This inversion option, or flip, is immediate and therefore faster than explicit surfing.
  • System 1 allows statistical analyses to be performed on the represented data.
  • the main options are summarized below.
  • Two walls can show groups of genes in which the genes are grouped according to different criteria. Selecting a group of genes in one wall can trigger an enrichment calculation in the other wall, with which system 1 shows in which groups of that wall the selected genes are present significantly less or more often than expected by chance.
  • In this case the calculation uses a hypergeometric function, which establishes the probability that the observed representation of the selected objects arises at random, and an observed/expected ratio, which indicates how much higher or lower the observed frequency is compared with a random distribution of the selected objects.
  • Internally, system 1 adopts a proprietary model optimized for 3D visualization and interaction, but it can be interfaced to existing databases via standard formats and conversion routines. The main features of system 1 regarding data management are listed below.
  • Non-public data interface: through the addition of dedicated extensions, system 1 could also handle input data structured in a specific way that is not compatible with public data banks.
  • The graphical user interface can be used to reorganize the data and the clusters. Mathematical clustering algorithms often do not give satisfactory results; the user can then manually select and move the elements of interest to build groups, with real-time control of their homogeneity level. A dataset reorganized in this way can be used directly for analysis or exported as files for archiving.
  • The Genomic Matrix includes a set of three-dimensional objects that represent the values of the data matrix. Every single matrix cell, identified by a row index and a column index, corresponds to a base 3D element, represented as a parallelepiped whose position depends directly on the row and column indices. The base of each parallelepiped is fixed; its height is dynamic and linked to the value of that matrix cell.
  • A second data matrix, with the same numbers of rows and columns as the first, is represented by linking its values to the colors of the parallelepipeds. In this way every single 3D object carries both topological information (its position identifies the row and the column in the matrix, and thus the gene it refers to and the sample it belongs to) and dynamic information (the values of two different measurements related to that gene in that sample).
  • Figures 3-5 show examples of genomic matrices at different levels of detail that can be displayed by system 1.
  • Figures 3-5 show, respectively, a first genomic matrix in which the higher-level clusters are displayed, a second genomic matrix in which some lower-level clusters and some base elements are displayed, and a third genomic matrix in which the single items are displayed.
  • The entire data matrix is treated as a rectangle laid out horizontally, whose size is computed automatically from the number of rows and columns in the matrix. Every base element (every parallelepiped) is linked to the matrix cell it refers to, and its position within the rectangle representing the entire data matrix is uniquely defined.
  • the genomic matrix can be associated to a hierarchical clustering structure for the rows (the genes) and to one for the columns (the samples) .
  • The resulting clusters are represented as larger parallelepipeds whose size equals the sum of their elements. For example, a group of ten rows is displayed as a parallelepiped ten times larger than the base one, positioned so as to include all its elements.
  • the order by which rows and columns are placed on the base rectangle may be defined by the cluster hierarchy. If a column clustering hierarchy is present, columns will be organized so that, moving from one hierarchy level to the next one, the position will not change (a parallelepiped therefore is decomposed in its elements but their footprint on the base rectangle does not vary) ; if no column clustering is present, columns are organized according to the data loading order.
  • the Chromosome Map provides a visual data organization based on the traditional representation of the genome with its base elements, which are chromosomes, arms, bands and sub-bands.
  • Figure 6 shows a standard Chromosome Map
  • Figure 7 shows a traditional graphical representation of Chromosome 12, displaying the short arm (on the left) and the long arm (on the right), their bands (integers) and their sub-bands (decimal digits).
  • the Object Map is a wall that provides a 3D representation of one of the dimensions of the data matrix, that is of the rows or of the columns, with the related hierarchy of groups if present. In this map, thus, the single matrix cells are not visible anymore, but instead the fields of one of the two dimensions are displayed.
  • The generated 3D model clearly reproduces the hierarchical structure and allows the user to precisely identify and select any single element, be it an object (for example a gene) or a group at any hierarchy level.
  • The Object Map is used to analyze single objects or groups of objects, as it allows the simultaneous visualization of values related to single objects and of metadata computed on groups of objects, possibly at different levels of the hierarchy.
  • The Object Map directly influences the other walls included in the 3D environment, as the selection of one or more elements determines which calculations must be executed and on which data (as described in paragraph 1.2 about analysis features). For example, selecting and analyzing an element in a first Object Map representing active genes can affect a second Object Map showing samples, causing new values to be computed and assigned to the elements of the second Object Map.
  • A user can control which statistical operation (for example average, standard deviation, p-value, etc.) is used in the calculations that determine the metadata of another wall.
  • Figure 8 shows an example of a 3D wall with an Object Map that can be displayed by system 1.
  • A wall with an Object Map appears as a pyramidal structure in which every layer corresponds to a level in the hierarchy of groups, and the top level shows the single elements. Every object (a single gene or a single sample) is represented as a parallelepiped with a base of predefined size.
  • Every group is represented as a parallelepiped whose base size is determined by the bases of the contained elements, while its position is assigned by means of a strategy derived from so-called “rectangle packing” algorithms.
  • the surface occupied by every group is the least possible one needed to include all its elements.
  • Every element is laid out above the parallelepiped that represents its parent group.
  • The resulting 3D wall has a base that represents the hierarchy root (which contains all the elements), while the top layer shows the parallelepipeds of the single objects (genes or samples).
  • Single genes are often linked to additional information that consists of relations among groups of genes (for example the miRNA target genes) ; these are non-exclusive groups (which means the same gene can be in more than one group) .
  • the Group Map generates a wall on which these groups are represented.
  • Every group is displayed as a square with variable color, size and transparency. These parameters are used to represent group information as follows:
  • the color is computed based on an intrinsic group value (static, defined as a parameter of the group itself) or based on the active selection on samples; the computation used for the color is equivalent to the one described for matrix data in paragraph 4.1 about color assignment;
  • In use, the system builds a graphical virtual environment with well-defined display rules, so that after a learning period the association between given graphical elements and the variables they represent becomes intuitive and immediately understandable. In this way even a very complex dataset can be displayed and analyzed quickly and efficiently.
  • The base graphical elements are simple shapes, either angular (square, rectangle, parallelepiped) or rounded (circle, cylinder), whose characteristics can simultaneously encode different types of data, as specified below.
  • the color of a graphical element is defined using a range of reference values and comparing the value to be represented with that range.
  • The range is defined by a minimum value, a specific reference value (called for simplicity the “central value”) and a maximum value. If the value to be represented is greater than the central value, the color is computed as an interpolation between the central color and the maximum color; if, instead, the value is less than the central value, the interpolation is done between the central color and the minimum color.
  • a user can set both the reference values and the corresponding colors for the minimum, the maximum and the center of the range; alternatively, predefined colors and reference values can be used (for example, dark grey for the center, red for values greater than the center, green for values less than the center, with values between -1 and +1 and centered at 0) .
  • maximum value equal to +3, represented by red (255, 0, 0 in RGB format).
  • The value 1.5 lies in the positive range, between the central value and the maximum value; more precisely, in this case it lies exactly halfway through the positive sub-range.
  • The color representing 1.5 is therefore half of the maximum color, i.e., 127, 0, 0 in RGB format.
  • the definition of the range of values clamps any value that is outside that range; in the example described above, a value of -3.5 is considered as -3 and, therefore, it corresponds to the color assigned to the minimum value, which is green.
  • The height of an element is defined directly by the value it represents; a control lets the user adjust the amplification of this value, to exaggerate or flatten the overall appearance as desired.
  • Width, instead, has different meanings depending on the three-dimensional wall.
  • In the Genomic Matrix, the base width directly indicates the number of included matrix cells: for the smallest parallelepipeds it is a single cell, while for cluster parallelepipeds it is a larger number (indicating how many elements the cluster contains).
  • In the Chromosome Map, the single genes are displayed as transverse lines/bands within the chromosome they belong to, as far as the display resolution allows at a given zoom level.
  • Bands can be raised to form parallelepipeds whose color and height represent different measurements made on the same gene; for example, color typically represents gene expression and height the copy number for that gene.
  • the system 1 can conveniently use a series of particular graphical effects to represent status information of an element.
  • A blinking color helps highlight an active or selected object. For example, on the walls displaying an
  • the system 1 highlights the selected objects with a semi-transparent box with a particular color that blinks. It is also possible to assign different colors to different selection types (for example, in the case of a differential selection, items in the primary selection are highlighted with a color that is different from the one used for those in the secondary selection) .
  • System 1 can also conveniently use transparency effects.
  • When a color is used to display an object's value, its transparency indicates the statistical significance, where meaningful. More transparent, and thus less visible, objects indicate less statistically significant values, while more opaque and solid objects highlight the most relevant values.
  • System 1 can also use graphical effects associated with the border color and thickness of the displayed elements. For example, for elements with no height (as in the case of the Group Map) a symbolic meaning can also be associated with the object border, using both color and thickness as dynamic values.
  • System 1 uses this kind of graphical metaphor to represent values with coarse dynamic variations, preferably of the “none/all” kind, or distributed over a limited number of categories, such as the clinical variables associated with the patients from whom the data samples derive (response/non-response to a given drug, recurrence/non-recurrence of illness, etc.).
  • Border thickness can also represent the statistical significance of a value very effectively, with the thickness inversely proportional to the p-value. In this case, whatever the border color, if the value is not relevant the border will be very thin and therefore not visible.
  • The system for biomolecular data display and analysis according to this invention overcomes the technical limitations of currently known systems because it makes it possible to visualize large amounts of biomolecular data, in particular genomic data, in a way that is easy for a user to read; to analyze and process such data interactively and extremely quickly; and to display the analysis and processing results in real time, again in an easily readable way.
  • Thanks to the graphical functionalities described above, the system according to this invention allows heterogeneous data (that is, data of different types) for the same objects to be displayed simultaneously, in general through the use of different graphical metaphors and, in particular, through the use of different colors, multiple dimensions and particular graphical effects.
  • The system according to this invention makes the integrative analysis of the data and of the processing results dramatically faster and more efficient for the user.
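The rule that each level of a group hierarchy is complete and exclusive, with sub-groups nested inside their parent groups, can be checked mechanically. The following is a minimal Python sketch, assuming a hierarchy is given as a list of levels (coarsest first), each level being a list of groups of gene or sample identifiers; `is_partition` and `is_valid_hierarchy` are hypothetical helper names, not part of system 1.

```python
from typing import Collection, Hashable, Iterable, Sequence, Set


def is_partition(groups: Iterable[Collection[Hashable]], items: Iterable[Hashable]) -> bool:
    """True if the groups are exclusive (no shared item) and complete (cover all items)."""
    seen: Set[Hashable] = set()
    for group in groups:
        for item in group:
            if item in seen:
                return False  # the same item sits in two groups: not exclusive
            seen.add(item)
    return seen == set(items)  # complete: every item covered and nothing extra


def is_valid_hierarchy(levels: Sequence[Sequence[Collection[Hashable]]],
                       items: Iterable[Hashable]) -> bool:
    """Check that every level is a partition and each group nests inside one parent group."""
    universe = set(items)
    if not all(is_partition(level, universe) for level in levels):
        return False
    for parent_level, child_level in zip(levels, levels[1:]):
        parents = [set(group) for group in parent_level]
        for child in child_level:
            if not any(set(child) <= parent for parent in parents):
                return False  # the child group straddles two parent groups
    return True
```

This check applies to the row or column hierarchy of a data matrix; the Group Map groups described in the text (for example miRNA target genes) are explicitly non-exclusive and would fail it by design.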
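The enrichment calculation between two walls (a hypergeometric probability plus an observed/expected ratio) can be sketched as follows. This is a minimal Python sketch, not system 1's implementation: it assumes the population is the set of genes present on the target wall, that "significantly higher" and "significantly lower" are tested with the upper and lower tails of the hypergeometric distribution respectively, and `enrichment` is a hypothetical helper name.

```python
from math import comb
from typing import Tuple


def enrichment(population: int, group_size: int, selected: int,
               overlap: int) -> Tuple[float, float, float]:
    """Hypergeometric enrichment of a gene selection within one group of another wall.

    population: N, genes represented on the target wall
    group_size: K, genes in the group being tested
    selected:   n, genes selected on the first wall (and present in the population)
    overlap:    k, selected genes that fall in the group
    Returns (p_higher, p_lower, observed/expected ratio).
    """
    total = comb(population, selected)

    def pmf(i: int) -> float:
        # Probability of exactly i selected genes in the group under random selection.
        return comb(group_size, i) * comb(population - group_size, selected - i) / total

    i_min = max(0, selected - (population - group_size))
    i_max = min(group_size, selected)
    p_higher = sum(pmf(i) for i in range(overlap, i_max + 1))  # P(X >= k): over-representation
    p_lower = sum(pmf(i) for i in range(i_min, overlap + 1))   # P(X <= k): under-representation
    expected = selected * group_size / population
    ratio = overlap / expected if expected > 0 else float("nan")
    return p_higher, p_lower, ratio
```

For example, if all 5 selected genes fall in a group holding 5 of 10 genes, the expected count is 2.5, the ratio is 2.0 and the upper-tail probability is 1/252.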
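The Genomic Matrix geometry described above (a unit footprint per cell placed by row and column, a cluster footprint covering all its members, a height driven by the value, and the positive/negative flip across the splitting plane) can be sketched in Python. The `Box` type, the `unit` and `gain` parameters (the latter standing in for the user's height amplification control), measuring height from the central value, and the `flipped` switch are assumptions for illustration; cluster members are assumed to occupy contiguous rows and columns, which ordering rows and columns by the cluster hierarchy provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Parallelepiped on the base rectangle: footprint origin (x, z), footprint size, signed height."""
    x: float
    z: float
    width: float
    depth: float
    height: float  # negative heights extend below the splitting plane


def _height(value: float, center: float, gain: float, flipped: bool) -> float:
    # Height is measured from the splitting plane at the central value (assumption);
    # the flip mirrors positive and negative values across that plane.
    h = gain * (value - center)
    return -h if flipped else h


def cell_box(row: int, col: int, value: float, center: float = 0.0,
             unit: float = 1.0, gain: float = 1.0, flipped: bool = False) -> Box:
    """Base element: fixed unit footprint placed by (row, col); height follows the cell value."""
    return Box(col * unit, row * unit, unit, unit, _height(value, center, gain, flipped))


def cluster_box(rows: range, cols: range, value: float, center: float = 0.0,
                unit: float = 1.0, gain: float = 1.0, flipped: bool = False) -> Box:
    """Cluster element: footprint covers every member cell; value is a metadata such as a mean."""
    return Box(cols.start * unit, rows.start * unit, len(cols) * unit, len(rows) * unit,
               _height(value, center, gain, flipped))
```

A cluster of ten rows in one column, `cluster_box(range(0, 10), range(4, 5), value)`, gets a footprint ten times the base one, matching the example in the text.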
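The text only says that group placement in the Object Map uses a strategy derived from "rectangle packing" algorithms. As one concrete, deliberately simple member of that family, the Python sketch below uses shelf packing (next-fit, deepest footprints first) to place child footprints inside a parent. Unlike the behavior described in the text, it does not guarantee the smallest possible surface; `shelf_pack` and `max_width` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Footprint = Tuple[float, float]  # (width, depth) of a child parallelepiped's base


def shelf_pack(sizes: Sequence[Footprint],
               max_width: float) -> Tuple[List[Tuple[float, float]], Footprint]:
    """Place child footprints on shelves of at most max_width; deepest children go first.

    Returns each child's (x, z) offset inside the parent and the parent's (width, depth).
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][1], reverse=True)
    positions: List[Tuple[float, float]] = [(0.0, 0.0)] * len(sizes)
    x = z = shelf_depth = used_width = 0.0
    for i in order:
        w, d = sizes[i]
        if x > 0 and x + w > max_width:  # current shelf is full: open a new one behind it
            z += shelf_depth
            x = shelf_depth = 0.0
        positions[i] = (x, z)
        x += w
        used_width = max(used_width, x)
        shelf_depth = max(shelf_depth, d)
    return positions, (used_width, z + shelf_depth)
```

Applied level by level, the returned bounding footprint becomes the base of the parent group's parallelepiped, and each child is drawn on top of it at its offset, giving the pyramidal layout described for the Object Map.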
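The color rule (clamp to the reference range, then interpolate from the central color toward the minimum or maximum color) can be written as a piecewise linear mapping. In this Python sketch `value_to_color` is a hypothetical name. The worked example in the text is reproduced under two assumptions not stated in the surviving text: the central color is black (0, 0, 0), which is what makes 1.5 map to half of red, and green is (0, 255, 0). Channels are truncated to integers, which is how 127.5 becomes 127.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def value_to_color(value: float, vmin: float, vcenter: float, vmax: float,
                   cmin: RGB, ccenter: RGB, cmax: RGB) -> RGB:
    """Map a value to a color by interpolating from the central color toward cmin or cmax."""
    v = min(max(value, vmin), vmax)  # values outside the reference range are clamped
    if v >= vcenter:
        span = vmax - vcenter
        t = (v - vcenter) / span if span > 0 else 0.0
        target = cmax
    else:
        t = (vcenter - v) / (vcenter - vmin)
        target = cmin
    # Linear interpolation per channel, truncated to an integer channel value.
    r, g, b = (int(c0 + t * (c1 - c0)) for c0, c1 in zip(ccenter, target))
    return (r, g, b)
```

With a range of -3 to +3 centered at 0, `value_to_color(1.5, -3, 0, 3, (0, 255, 0), (0, 0, 0), (255, 0, 0))` gives (127, 0, 0), and -3.5 is clamped to -3 and gives pure green.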
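The transparency and border-thickness cues for statistical significance can be sketched as two small mappings from a p-value. Only the direction of each mapping comes from the text (a lower p-value gives a more opaque object and a thicker border, with thickness inversely proportional to the p-value); the `scale` factor, the `max_px` cap, the log-scale opacity curve and the `p_floor` and `alpha_min` parameters are assumptions, and both function names are hypothetical.

```python
import math


def border_thickness(p: float, scale: float = 0.01, max_px: float = 6.0) -> float:
    """Thickness inversely proportional to the p-value, capped at max_px (hypothetical values)."""
    if p <= 0:
        return max_px
    return min(max_px, scale / p)


def opacity(p: float, p_floor: float = 1e-6, alpha_min: float = 0.1) -> float:
    """Map a p-value to opacity in [alpha_min, 1]: smaller p gives a more opaque object.

    The log10 scaling is an assumed choice; p-values below p_floor are fully opaque.
    """
    p = min(max(p, p_floor), 1.0)
    strength = -math.log10(p) / -math.log10(p_floor)  # 0 at p = 1, 1 at p = p_floor
    return alpha_min + (1.0 - alpha_min) * strength
```

With these defaults a non-significant value such as p = 0.5 gets a 0.02-pixel border, which is effectively invisible, matching the behavior described in the text.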

Abstract

The present invention concerns an electronic system (1) for biomolecular data display and analysis, configured to: display a tridimensional virtual environment including one or more tridimensional walls, wherein each displayed tridimensional wall includes respective tridimensional graphical elements and represents biomolecular data by means of the respective tridimensional graphical elements; and allow a user to surf in the displayed tridimensional virtual environment, modify the display resolution of the displayed tridimensional virtual environment, select one or more tridimensional graphical elements of one or more displayed tridimensional walls, and activate one or more predefined analysis and/or processing functions for one or more selected tridimensional graphical elements. Moreover, the electronic system (1) is further configured to: modify interactively and in real time the displayed tridimensional virtual environment depending on the user's surfing and in response to a user's modification of the display resolution; and, in response to the user's activation of one or more predefined analysis and/or processing functions for one or more selected tridimensional graphical elements of one or more displayed tridimensional walls, apply said predefined analysis and/or processing function(s) to the biomolecular data represented by said selected tridimensional graphical element(s), and modify interactively and in real time the selected tridimensional graphical element(s) depending on one or more results obtained by applying said predefined analysis and/or processing function(s).

Description

INTERACTIVE SYSTEM FOR DISPLAY AND ANALYSIS OF BIOMOLECULAR DATA
TECHNICAL FIELD OF THE INVENTION
The present invention is within the field of bioinformatics and relates to an interactive system for display and analysis of biomolecular data, in particular of genomic data, proteomic data and of biological and/or clinical and/or environmental data related thereto.
Specifically, the present invention concerns an electronic system configured to display biomolecular data, to perform analysis and processing, such as statistical analysis and processing, of the displayed biomolecular data, and display the results of the performed analysis and processing.
DEFINITIONS
The meaning associated with some specific terms used in this document is provided hereinafter.
1. "Sample" - In the present document the term "sample" refers to the biological material on which analyses are performed, typically molecular determinations. A sample may, for example, be a fragment of a patient's tumor tissue, in which case this document also uses the term "sample/patient".
2. "Gene" - Genes are known to be the basic functional units of the genome of living organisms. Each gene corresponds to a sequence of nucleic acids, in particular of DNA (deoxyribonucleic acid) or, more rarely, of RNA (ribonucleic acid), delimited by a start coordinate and an end coordinate that can be mapped on the genome of the species to which the gene belongs.
3. "miRNA" - miRNAs (microRNAs) are known to be small single-stranded RNA molecules of 20-22 nucleotides, produced by cells through transcription from genomic DNA. Like genes, they therefore have precise coordinates on the genome. In this document, the term "gene" is used to refer to both genes in the strict sense and miRNAs, unless the latter are explicitly mentioned and/or distinguished.
4. "Expression" - In the present document the term "expression" means a measurement of the abundance (or amount), in a test sample, of the RNA relative to a given gene or miRNA.
5. "Data matrix" - In this document, the term "data matrix" is used to mean a matrix or table where rows represent genes and columns represent samples. Each cell of the matrix contains, therefore, a value related to a certain gene measured in a given sample.
6. "Dataset" - In this document, the term "dataset" refers to a collection of data from molecular analyses (typically several analyses) and various annotations related to the same series of samples. A dataset typically comprises multiple tables.
7. "Group" - In this document, the term "group" means a set of objects (genes, samples, etc.) grouped on the basis of characteristics that unite them or set them apart.
8. "Cluster" - In the present document the term "cluster" means a group created by appropriate computational algorithms, called "clustering algorithms", that group items based on their similarities, calculated according to functions of choice (for example, Pearson correlation, Euclidean distance, etc.).
9. "3D graphics" - In this document, the term "3D graphics" means the so-called 3D computer graphics, which, as is known, is a technology for creating and displaying, via computer, static or moving images of objects on the basis of three-dimensional mathematical models of the objects to be represented, these models being generated and processed by an electronic computer.
10. "Metadata" - In this document the term "metadata" means a value calculated on the basis of several elements (for example genes, samples, etc.) using functions of choice. For example, a "metasample" is a virtual sample resulting from the aggregation of the samples belonging to a particular group, for which the average of the measurements is calculated for one or more genes of interest. Examples of metadata are the mean, the standard deviation, the p-value of a statistical test, etc.
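As an illustration of definitions 5 and 10, the following minimal Python sketch stores a data matrix as a mapping from gene names to per-sample values and computes a metasample for a group of samples. The names `metasample` and `DataMatrix` and the gene/sample identifiers are hypothetical; the aggregation function defaults to the mean, as in the metasample example above, but any function of choice (for instance `statistics.stdev`) can be passed instead.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Data matrix (definition 5): one row per gene, one value per sample (column).
DataMatrix = Dict[str, List[float]]


def metasample(matrix: DataMatrix,
               samples: Sequence[str],
               group: Sequence[str],
               genes: Sequence[str],
               func: Callable[[List[float]], float] = mean) -> Dict[str, float]:
    """Virtual sample for `group`: for each gene of interest, aggregate
    (by default, average) its values over the samples in the group."""
    columns = [samples.index(s) for s in group]  # column indices of the group
    return {g: func([matrix[g][c] for c in columns]) for g in genes}


# Hypothetical example: 2 genes measured in 4 samples.
samples = ["S1", "S2", "S3", "S4"]
matrix = {"GENE_A": [1.0, 3.0, 5.0, 7.0],
          "GENE_B": [2.0, 2.0, 8.0, 8.0]}
print(metasample(matrix, samples, ["S1", "S2"], ["GENE_A", "GENE_B"]))
# {'GENE_A': 2.0, 'GENE_B': 2.0}
```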
STATE OF THE ART
As is known, in the field of genomics the data measured on tissue samples tend to be very large in volume, reaching in practice many thousands of values for each measurement, and can in theory grow up to billions of values. In addition, several measurements are usually carried out for each sample, thus yielding several vectors of thousands of values per sample.
The analysis of this type of data involves comparing a large number of samples, so that the information to be processed is often in the form of matrices in which each row holds a specific measurement (usually relating to a given gene) and each column holds the complete data of one sample. A matrix of this type has many thousands of rows (individual genes) and usually hundreds of columns (the samples used in the case study).
The tools currently available to bioinformaticians allow these data to be displayed and analyzed but, because of the amount of information, they force the user to always work with partial views. These systems usually involve very time-consuming procedures for handling and processing the information, so that a complete analysis often requires the use of many tools in sequence.
OBJECT AND SUMMARY OF THE INVENTION
In consideration of the aforementioned technical limitations associated with the currently known systems for the display and analysis of genomic data, the Applicant has felt the need to develop an innovative system for display and analysis of genomic data capable, in general, of overcoming these technical limitations and, in particular, of displaying large amounts of genomic data in an easily intelligible way for a user, of analyzing and processing large amounts of genomic data in an interactive and extremely fast way, and of displaying the results of the analysis and of the processing in real time and in an easily intelligible way for a user.
Therefore, it is an object of the present invention to provide a system for the display and analysis of genomic data of the aforesaid type.
This object is achieved by the present invention in that it relates to an electronic system for the display and analysis of biomolecular data, in particular of genomic data, of proteomic data and of biological and/or clinical and/or environmental data related thereto, as defined in the attached claims.
BRIEF DESCRIPTION OF THE DRAWINGS
For a better understanding of the present invention, some preferred embodiments, which are intended purely by way of example and are not to be construed as limiting, will now be described with reference to the attached drawings (not to scale), wherein:
• Figure 1 schematically illustrates an electronic system for the display and analysis of biomolecular data according to a preferred embodiment of the present invention;
• Figure 2 shows an example of a three-dimensional virtual environment with three walls that can be displayed by the system shown in Figure 1;
• Figures 3-5 show examples of three-dimensional Genomic Matrices at different levels of detail that can be displayed by the system shown in Figure 1;
• Figure 6 shows a standard Chromosome Map;
• Figure 7 shows a traditional graphical representation of Chromosome 12;
• Figure 8 shows an example of a three-dimensional Object Map that can be displayed by the system shown in Figure 1; and
• Figure 9 shows an example of a three-dimensional Group Map that can be displayed by the system shown in Figure 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION
The following description is presented to enable a person skilled in the art to make and use the present invention. Various modifications to the embodiments will be readily apparent to those skilled in the art, without departing from the scope of the present invention as claimed. Thus, the present invention is not intended to be limited to the embodiments shown and described, but shall be accorded the widest protective scope consistent with the principles and features described and claimed herein.
In particular, for simplicity, the present invention will be described hereinafter with explicit reference to genomic data and, where appropriate, to medical and/or clinical data, without loss of generality. It is worth stressing once again that the present invention may advantageously be used for the display and analysis of biomolecular data in general and, in particular, of genomic data, of proteomic data and of biological and/or clinical and/or environmental data related thereto.
As previously described, the present invention relates to an interactive electronic system for the display and analysis of biomolecular data.
Conveniently, the present invention is implemented by means of an electronic system for processing, calculating and displaying, such as a computer, that is specifically programmed, i.e., programmed by means of a specific software program such that, when the program is executed by that electronic system, the latter implements the specific functionalities of display and analysis of biomolecular data according to the present invention, which will be described in detail in the following.
In this connection, Figure 1 shows a block diagram representing an example architecture of an electronic system (indicated as a whole by 1) for the display and analysis of biomolecular data according to a preferred embodiment of the present invention.
In particular, as shown in Figure 1, the electronic system 1 includes:
• storage means 11 configured to store the aforementioned specific software program;
• display means 12, for example a screen, operable to implement the specific functionalities of display and analysis of biomolecular data according to the present invention, as will be described in detail hereinafter;
• user input means 13, such as a mouse and/or a keyboard, configured to allow a user to interact, in use, with the system 1 in relation to the implementation of the specific functionalities of display and analysis of biomolecular data according to the present invention, as will be described in detail hereinafter;
• input/output means 14, such as a USB (Universal Serial Bus) port and/or an Ethernet port and/or a modem/router and/or a CD/DVD player/writer, configured to receive input data to be used for the implementation of the specific functionalities of display and analysis of biomolecular data according to the present invention, and/or to export output data obtained through the implementation of those functionalities, as will be described in detail in the following; and
• computing and processing means 15, for example one or more processors (CPUs), that are coupled, for example through one or more communication buses (not shown in Figure 1 for the sake of illustration simplicity), with the storage means 11, the display means 12, the user input means 13 and the input/output means 14 to exchange data, and that are operable to execute the specific software program stored on the storage means 11 to implement the specific functionalities of display and analysis of biomolecular data according to the present invention (appropriately operating the display means 12, receiving interactive commands of a user from the user input means 13 and receiving and/or providing input/output data from/to the input/output means 14), as will be described in detail hereinafter.
1. The basic concepts of the present invention
The present invention uses three-dimensional (3D) computer graphics technologies to build a virtual environment that symbolically represents the input data. A user can surf in this 3D virtual environment and interact with its different elements, which are subsets of the input dataset. The user interaction allows the building of analysis paths, using statistical tools for the comparison of data groups in order to highlight correlations.
1.1 Interactive real-time three-dimensional graphical layout
According to the present invention, the 3D virtual environment is interactive and can be freely surfed, examining various elements as needed. The interaction also allows the user to select one or more objects, examine related detailed information and activate analysis paths starting from the selected items.
The 3D virtual environment has a number of advanced surfing and interaction features (described in detail in the following paragraphs and sections), which are intended to provide the user with simple and effective tools for data analysis.
1.1.1 3D walls
Starting from the input data, the present invention allows the creation of a 3D virtual environment that includes a number of "walls", i.e. individual three-dimensional (3D) graphical objects, each representing a part of the available information. In each wall, the user can control various display parameters and can interact with the graphical interface in order to perform wall-specific analyses. Some examples of 3D walls that the system 1 can display (which will be described in more detail in the following) are provided hereinafter:
• Genomic Matrix, for displaying data matrices;
• Chromosome Map, for displaying data in the context of a graphical representation of the chromosomes;
• Object Map, for contextual representation of individual elements (e.g., patients, tissue samples, gene clusters) and their groupings;
• Group Map, for the representation of groups of elements, with loss of information about individual elements.
Figure 2 shows an example of a three-wall 3D virtual environment that can be displayed by the system 1.
1.1.2 Changing resolution
The system 1 allows the resolution of the 3D virtual environment to be changed so as to show different levels of detail, depending both on the overall volume of data and on which portion of the environment is being viewed; dynamically changing the resolution allows both an effective real-time display of very large datasets and a detailed analysis of portions of them. The resolution level can be selected automatically by the system 1 or set manually by the user.
1.1.3 Scale of representation
In order to get the maximum advantage from the display parameters, a user can set the reference range of the values to be represented by defining a minimum, a maximum and a center value to be used as a reference (which is not necessarily the average of the minimum and maximum). A specific part of the spectrum of values can thus be better highlighted, simply by varying the three reference points, with the 3D representation updated in real time.
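One way to realize the three-point reference range is a piecewise-linear mapping that sends the minimum to -1, the center to 0 and the maximum to +1, scaling each side of the center independently so that the center need not be the midpoint. The Python sketch below assumes that mapping, together with clamping of out-of-range values as described for colors in paragraph 4.1; the function name and the [-1, +1] target interval are illustrative choices, not taken from the source.

```python
def normalize(value: float, vmin: float, center: float, vmax: float) -> float:
    """Map `value` onto [-1, +1]: vmin -> -1, center -> 0, vmax -> +1.

    The two halves of the range are scaled separately, so an asymmetric
    center (e.g. vmin=0, center=2, vmax=10) is handled correctly.
    Values outside [vmin, vmax] are clamped to the nearest end.
    """
    if not vmin < center < vmax:
        raise ValueError("expected vmin < center < vmax")
    value = max(vmin, min(vmax, value))      # clamp to the reference range
    if value >= center:
        return (value - center) / (vmax - center)
    return -(center - value) / (center - vmin)


# Moving only the center changes which part of the spectrum is emphasized.
print(normalize(5.0, 0.0, 2.0, 10.0))   # 0.375
print(normalize(1.0, 0.0, 2.0, 10.0))   # -0.5
print(normalize(20.0, 0.0, 2.0, 10.0))  # 1.0 (clamped)
```

Under this assumption, a displayed height could be obtained by multiplying the normalized value by the user-controlled amplification factor mentioned in paragraph 4.2.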
1.1.4 Hierarchical structures
It is possible to define a set of groups (or clusters) of the fields of the matrix (rows or columns, that is, genes or samples); for example, a set of groups of rows can be defined, where each group contains one or more genes. This set of groups is complete (it contains all the genes or all the samples) and exclusive (the same item cannot be in two different groups). Furthermore, a hierarchy of clusters can be defined: given a set of groups, each of them can in turn have sub-groups, thereby forming a tree whose leaves are the individual fields (genes or samples). Each level of the hierarchy contains a complete and exclusive set of groups. The hierarchical structure is used in combination with the dynamically changing resolution to vary the displayed level of detail. For example, an overall view of the genomic matrix can show only the top-level clusters; browsing and zooming in on individual groups can reveal elements of lower hierarchical levels, down to the single cell of the data matrix.
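The hierarchy and its use for level-of-detail selection can be sketched as a tree of groups. The `Node` class, the `level` and `is_partition` helpers and, in particular, the rule used by `deepest_level_within` (show the deepest level whose element count stays within a display budget) are assumptions made for illustration; the source does not specify how the system 1 chooses the resolution automatically.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A group in the hierarchy; a node without children is a single field
    (a gene or a sample)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> Set[str]:
        if not self.children:
            return {self.name}
        result: Set[str] = set()
        for child in self.children:
            result |= child.leaves()
        return result


def level(root: Node, depth: int) -> List[Node]:
    """Elements displayed at a given depth; leaves reached earlier stay visible."""
    nodes = [root]
    for _ in range(depth):
        nodes = [c for n in nodes for c in (n.children or [n])]
    return nodes


def is_partition(groups: List[Node], items: Set[str]) -> bool:
    """True if the groups are complete (cover every item) and exclusive."""
    seen: Set[str] = set()
    for g in groups:
        members = g.leaves()
        if seen & members:          # an item appears in two groups
            return False
        seen |= members
    return seen == items


def deepest_level_within(root: Node, max_elements: int) -> int:
    """Hypothetical LOD rule: deepest level with at most max_elements items."""
    depth, current = 0, [root]
    while any(n.children for n in current):
        nxt = level(root, depth + 1)
        if len(nxt) > max_elements:
            break
        depth, current = depth + 1, nxt
    return depth


genes = Node("all genes", [Node("G1", [Node("g1"), Node("g2")]),
                           Node("G2", [Node("g3"), Node("g4"), Node("g5")])])
print(deepest_level_within(genes, 2))    # 1 -> show clusters G1 and G2
print(deepest_level_within(genes, 10))   # 2 -> show the single genes
```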
1.1.5 Reference plane
The 3D virtual environment can include a reference "splitting" plane placed at the reference value (central value) of a wall; this plane allows the positive values (above the reference value) to be better distinguished from the negative ones. A user can control the level of transparency of the plane for optimal viewing of the positive/negative data.
1.1.6 Value flipping
Since both positive and negative values are represented in the 3D virtual environment, drawn on opposite sides of a reference plane that corresponds to the central value, it may sometimes be necessary to see elements that are located below the splitting plane. Instead of navigating down to see them, the user can simply reverse the representation of positive and negative values, thus moving the positive ones under the splitting plane and vice versa. This inversion option, or flip, is immediate and therefore faster than explicit surfing.
It is also possible to show the absolute value of each data item, in order to bring all the data into the positive range (and thus above the reference plane). This too is a simple option of the user interface, with immediate effect, that neither implies a significant response time nor requires complex actions.
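The splitting plane, the flip and the absolute-value options of paragraphs 1.1.5 and 1.1.6 amount to simple per-element transformations of each element's signed height relative to the central value. The Python sketch below assumes that heights are measured from the plane placed at the central value and that the absolute-value option takes the absolute distance from that value (which coincides with the absolute value of the data when the center is 0); the function name is hypothetical.

```python
from typing import List


def plane_heights(values: List[float], center: float, *,
                  flip: bool = False, absolute: bool = False) -> List[float]:
    """Signed height of each element with respect to the splitting plane
    placed at `center`: positive -> drawn above, negative -> below.

    flip:     swap the two sides (values above the plane go below and vice versa)
    absolute: bring everything above the plane; takes precedence over flip
    """
    heights = []
    for v in values:
        h = v - center
        if absolute:
            h = abs(h)
        elif flip:
            h = -h
        heights.append(h)
    return heights


data = [2.0, -1.0, 3.0]
print(plane_heights(data, 0.5))                 # [1.5, -1.5, 2.5]
print(plane_heights(data, 0.5, flip=True))      # [-1.5, 1.5, -2.5]
print(plane_heights(data, 0.5, absolute=True))  # [1.5, 1.5, 2.5]
```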
1.2 Display and analysis functionalities associated with the selection of elements
The system 1 allows statistical analyses to be performed on the represented data. The capability to select one or more elements, together with the possibility to choose which operation to apply to the selection, offers the user a number of options to analyze the displayed values. The main options are summarized below.
• Selection of elements/objects/groups in a wall and display of the items associated with them on other walls. For example, a group of genes can be selected in a map of gene groups, and the data relating to those genes in the samples represented in another wall can be displayed.
• Capability to select elements by thresholds or logical operators.
• The selection can activate calculation functions whose results are displayed in real time on other walls. Two application examples follow, which illustrate, but do not limit, the possibilities of using a three-dimensional multiple-walled visualization for integrative multi-parametric analysis.
a. There can be two walls of gene groups, in which the genes are grouped according to different criteria. Selecting a group of genes in one wall can activate an enrichment calculation in the other wall, whereby the system 1 shows in which groups of this wall the genes among those selected are present significantly less or more often than expected by chance. The calculation activated in this case is a hypergeometric function, which establishes the probability of the observed representation of the selected objects under random selection, together with an observed/expected ratio, which indicates how much the observed frequency is higher or lower than that of a random distribution of the selected objects.
b. There can be a wall of gene groups and a wall of sample/patient groups. When a group of genes is selected, the average expression of the selected genes can be displayed, for each group, on the wall of sample/patient groups. Likewise, the result of any other function applied to the data obtained for the selected genes can be displayed in the samples of the represented groups.
In summary, by exploiting the three-wall layout (each wall may be of any of the four types mentioned in section 1.1.1), it is possible to select groups of genes or of samples and display in real time the results of functions applied to those genes in other contexts, such as groups of samples or other groups of genes, and so on.
• Possibility of working in "differential selection" mode, where there are two distinct groups of selected elements and the calculations are carried out by comparing these two groups. For example, two groups of samples can be selected and, in another wall, the genes that exhibit significant differences (for example of expression) between the two groups of selected samples can be displayed.
• Capability to define the function of interest invoked by the selection using a drop-down menu that appears as soon as a group of genes or objects is selected.
• Capability to interface, in input/output, with external programs, including open-source ones, for maximum flexibility in data analysis. Using standard file formats, data are exported to the external program, analyzed there, and then re-imported for display in real time. This opens the system 1 to future developments that do not require a change in its internal base architecture. For example, it is possible to interface the system 1 with the R programming language, exporting data and generating command strings that instruct an external program written in R to acquire the data, process them and then supply them back to the system 1 for display, through a process that is transparent to the end user. In this way, even a user who does not know the R programming language benefits directly, through the use of the system 1, from the analytical resources provided by programs written in R.
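The enrichment calculation of example a. above can be sketched with the hypergeometric distribution: out of N genes in total, K belong to a given group of the second wall, the user selects n genes, and k of them fall in that group. The Python sketch below returns the observed/expected ratio together with both tail probabilities (enrichment and depletion), using only `math.comb`; the function names are hypothetical, and any multiple-testing correction across groups, which the source does not discuss, is left out.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(i: int, N: int, K: int, n: int) -> float:
    """P(X = i) when n of N genes are drawn at random and K of the N are
    in the group: probability that exactly i drawn genes are in the group."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def enrichment(N: int, K: int, n: int, k: int) -> Tuple[float, float, float]:
    """Return (observed/expected ratio, P(X >= k), P(X <= k)).

    A small P(X >= k) suggests over-representation of the selected genes in
    the group; a small P(X <= k) suggests under-representation.
    """
    lo, hi = max(0, n - (N - K)), min(n, K)   # support of X
    if not lo <= k <= hi:
        raise ValueError("k is impossible for these N, K, n")
    expected = n * K / N
    ratio = k / expected if expected > 0 else float("nan")
    p_high = sum(hypergeom_pmf(i, N, K, n) for i in range(k, hi + 1))
    p_low = sum(hypergeom_pmf(i, N, K, n) for i in range(lo, k + 1))
    return ratio, p_high, p_low


# Hypothetical numbers: 20 genes, a group of 5, 4 selected, 3 of them in the group.
ratio, p_high, p_low = enrichment(N=20, K=5, n=4, k=3)
print(ratio)             # 3.0  (1 gene expected by chance, 3 observed)
print(round(p_high, 4))  # 0.032
```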
2. General data architecture
From the data model point of view, the system 1 internally adopts a proprietary model optimized for 3D visualization and interaction, but it can be interfaced to existing databases via standard formats and conversion routines. The main features of the system 1 with regard to data management are listed below.
• Public data interface. The system 1 is envisaged to manage data input structured according to a series of specifications currently in use in genomic data banks (GEO, TCGA, ICGC, etc.). The graphical user interface (GUI) will also allow non-technical users to manage the acquisition and analysis of publicly available datasets.
• Non-public data interface. The system 1 is envisaged to also manage, through the addition of dedicated extensions, data input structured in a specific way that is not compatible with the public data banks.
• Conversion of the input data into a proprietary data structure optimized to ensure a smooth, up-to-date, real-time display of the entire 3D virtual environment.
• Graphical selection used as a filter. The system 1 can be used directly, in a graphical and intuitive way, to select a subset of the data and limit the visualization and analysis to the selected data. The system 1 can also generate output files containing only the elements of interest.
The graphical user interface can also be used to reorganize the data and the clusters. Often the use of mathematical algorithms for the generation of clusters does not give satisfactory results; the user can then, by manually selecting and moving the elements of interest, generate groups while monitoring their homogeneity level in real time. A dataset reorganized in this way can be used directly for analysis or exported as files for archiving.
3. Detailed description of walls
3.1 Genomic Matrix
The Genomic Matrix includes a set of tridimensional objects that represent the values of the data matrix. Every single matrix cell, identified by a row index and a column index, corresponds to a base 3D element, represented as a parallelepiped whose position depends directly on the row and column indices. The base of each parallelepiped is fixed and does not change; its height, instead, is dynamic and linked to the value of that matrix cell.
A second data matrix, with the same number of rows and columns as the first one, is represented by linking its values to the colors of the parallelepipeds. In this way every single 3D object carries both topological information (its position identifies the row and the column of the matrix, hence the gene it refers to and the sample it belongs to) and dynamic information (the values of two different measurements related to that gene in that sample).
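The mapping from the two data matrices to parallelepipeds can be sketched as follows: the row and column indices fix the position of each base element on the horizontal plane, the first matrix gives its height and the second its color value. The `Box` fields, the unit cell size and the `height_scale` parameter are illustrative assumptions; converting the color value into an actual color is the subject of paragraph 4.1.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    row: int             # gene index in the data matrix
    col: int             # sample index in the data matrix
    x: float             # position on the base rectangle (from the column)
    z: float             # position on the base rectangle (from the row)
    size: float          # fixed square base side
    height: float        # from the first matrix (may be negative)
    color_value: float   # from the second matrix, mapped to a color later


def genomic_matrix_boxes(heights: Sequence[Sequence[float]],
                         colors: Sequence[Sequence[float]],
                         cell: float = 1.0,
                         height_scale: float = 1.0) -> List[Box]:
    """One base parallelepiped per matrix cell; both matrices must share a shape."""
    if len(heights) != len(colors) or any(
            len(h) != len(c) for h, c in zip(heights, colors)):
        raise ValueError("the two data matrices must have the same shape")
    boxes = []
    for i, (h_row, c_row) in enumerate(zip(heights, colors)):
        for j, (h, c) in enumerate(zip(h_row, c_row)):
            boxes.append(Box(row=i, col=j, x=j * cell, z=i * cell, size=cell,
                             height=h * height_scale, color_value=c))
    return boxes


first = [[1.0, 2.0], [3.0, 4.0]]    # hypothetical first measurement (heights)
second = [[0.1, 0.2], [0.3, 0.4]]   # hypothetical second measurement (colors)
for b in genomic_matrix_boxes(first, second):
    print(b.row, b.col, b.x, b.z, b.height, b.color_value)
```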
Figures 3-5 show some examples of genomic matrices at different levels of detail that can be displayed by the system 1. In particular, Figures 3-5 show, respectively, a first genomic matrix in which the higher-level clusters are displayed, a second genomic matrix in which some lower-level clusters and some base elements are displayed, and a third genomic matrix in which the single items are displayed.
3.1.1 Spatial layout of objects in the Genomic Matrix
The entire data matrix is considered as a rectangle laid out horizontally, whose size is automatically computed from the number of rows and columns of the matrix itself. Every base element (every parallelepiped) is linked to the matrix cell it refers to, and its position within the rectangle that represents the entire data matrix is uniquely defined.
The genomic matrix can be associated with a hierarchical clustering structure for the rows (the genes) and with one for the columns (the samples). The resulting clusters are represented as larger parallelepipeds whose size equals the sum of their elements. For example, a group of ten rows is displayed as a parallelepiped ten times larger than the base one, positioned so as to include all its elements.
Note that the order by which rows and columns are placed on the base rectangle may be defined by the cluster hierarchy. If a column clustering hierarchy is present, columns will be organized so that, moving from one hierarchy level to the next one, the position will not change (a parallelepiped therefore is decomposed in its elements but their footprint on the base rectangle does not vary) ; if no column clustering is present, columns are organized according to the data loading order.
Concerning rows instead, if a row clustering hierarchy is present, the same principle used for columns is applied; if, instead, no explicit clustering is indicated, rows are clustered using a predefined group hierarchy that corresponds to the Chromosome Map (on this matter, see the next paragraph 3.2).
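The property that a cluster's footprint does not move when it is decomposed into its elements follows if the columns (or rows) are placed in the depth-first order of the cluster tree: every cluster then covers a contiguous range of positions whose width equals its number of elements. A minimal Python sketch, assuming a tree given as nested `(name, children)` tuples with column names as leaves (a hypothetical representation):

```python
from typing import Dict, List, Tuple, Union

# A leaf is a column (sample) name; a cluster is (cluster_name, [children]).
Tree = Union[str, Tuple[str, List["Tree"]]]


def layout(tree: Tree) -> Tuple[List[str], Dict[str, Tuple[int, int]]]:
    """Depth-first leaf order plus, for every cluster, the contiguous
    [start, end) range of base-rectangle positions it covers."""
    order: List[str] = []
    spans: Dict[str, Tuple[int, int]] = {}

    def visit(node: Tree) -> None:
        if isinstance(node, str):      # a single column
            order.append(node)
            return
        name, children = node
        start = len(order)
        for child in children:
            visit(child)
        spans[name] = (start, len(order))

    visit(tree)
    return order, spans


columns = ("all", [("A", ["s3", "s1"]), ("B", ["s2", ("B1", ["s5", "s4"])])])
order, spans = layout(columns)
print(order)   # ['s3', 's1', 's2', 's5', 's4']
print(spans)   # {'A': (0, 2), 'B1': (3, 5), 'B': (2, 5), 'all': (0, 5)}
```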
3.2 Chromosome Map
The Chromosome Map provides a visual data organization based on the traditional representation of the genome with its base elements, which are chromosomes, arms, bands and sub-bands.
In this regard, Figure 6 shows a standard Chromosome Map and Figure 7 shows a traditional graphical representation of Chromosome 12, in which the short arm (on the left) and the long arm (on the right) are displayed, together with their respective bands (integers) and sub-bands (decimal digits).
This representation is very familiar to cytogeneticists and to anyone who works in the biomedical domain, so the display and analysis results on a Chromosome Map are immediately readable in terms of chromosomal domains. The representation of genes strictly follows their location along the related chromosome, so that zones free from genes or other data are not compressed but maintain their size, and the overall geography remains stable.
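Keeping gene-free zones at their true size amounts to placing each gene proportionally to its base-pair coordinates along the rendered chromosome, rather than packing genes side by side. The function below is a hypothetical sketch of that mapping; the coordinates and chromosome length in the example are made-up values, not real genome annotations.

```python
from typing import Tuple


def gene_segment(start_bp: int, end_bp: int, chrom_length_bp: int,
                 rendered_length: float) -> Tuple[float, float]:
    """Map a gene's [start, end] genomic interval onto a chromosome drawn with
    length `rendered_length`, proportionally to base-pair position, so that
    regions without genes keep their relative size."""
    if not 0 <= start_bp <= end_bp <= chrom_length_bp:
        raise ValueError("gene coordinates must lie within the chromosome")
    return (start_bp * rendered_length / chrom_length_bp,
            end_bp * rendered_length / chrom_length_bp)


# Hypothetical chromosome of 100 Mb drawn 1000 units long.
print(gene_segment(25_000_000, 25_050_000, 100_000_000, 1000.0))  # (250.0, 250.5)
```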
3.3 Object Map
The Object Map is a wall that provides a 3D representation of one of the dimensions of the data matrix, that is of the rows or of the columns, with the related hierarchy of groups if present. In this map, thus, the single matrix cells are not visible anymore, but instead the fields of one of the two dimensions are displayed. The generated 3D model reproduces in a clear way the hierarchical structure and allows to precisely identify and select any single element, be it an object (for example a gene) or a group at any hierarchy level.
The Object Map is used for the analysis of single objects or of groups of objects, as it allows the simultaneous visualization of values related to single objects and of metadata computed on groups of objects, possibly at different levels of the hierarchy. Moreover, the Object Map directly influences the other walls included in the 3D environment, as the selection of one or more elements determines which calculations must be executed and on which data (as described in paragraph 1.2 about analysis features). For example, selecting and analyzing an element in a first Object Map representing active genes can affect a second Object Map showing samples, in particular by causing new values to be computed and assigned to the elements of said second Object Map.
A user can control which statistical operation (for example average, standard deviation, p-value, etc.) is used in the calculations that determine the metadata of another wall.
Figure 8 shows an example of a 3D wall with an Object Map that can be displayed by the system 1.
3.3.1 Spatial layout of objects in the Object Map
The wall with an Object Map appears as a pyramidal structure, in which every layer corresponds to a level in the hierarchy of groups and the top level shows the single elements. Every object (the single gene or the single sample) is represented as a parallelepiped whose base has a predefined size.
At any hierarchy level, every group is represented as a parallelepiped whose base dimensions are defined by the set of bases of the contained elements, while its position is assigned by means of a strategy derived from the so-called "rectangle packing" algorithms. In this way, the surface occupied by every group is the smallest one needed to include all its elements.
Moreover, every element is laid out above the parallelepiped that represents its parent group. As every hierarchical level has an assigned height, the resulting 3D wall has a base that represents the hierarchy root (which contains all the elements), while the top layer shows the parallelepipeds of the single objects (genes or samples).
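The source states only that group positions follow a strategy derived from rectangle packing algorithms. As one simple member of that family, the Python sketch below uses a shelf heuristic: the footprints of a group's children are sorted by depth and laid out in rows within a strip about as wide as the square root of their total area, and the bounding box of the result becomes the parent group's base. Shelf packing does not guarantee the minimum area; the function name and the strip-width rule are assumptions.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float]  # (width, depth) of a footprint on the floor plane


def shelf_pack(rects: List[Rect]) -> Tuple[List[Tuple[float, float]], Rect]:
    """Place child footprints on shelves; return (x, y) offsets in input order
    and the bounding footprint, which becomes the parent group's base."""
    if not rects:
        return [], (0.0, 0.0)
    total_area = sum(w * d for w, d in rects)
    strip = max(math.sqrt(total_area), max(w for w, _ in rects))
    order = sorted(range(len(rects)), key=lambda i: rects[i][1], reverse=True)
    positions: List[Tuple[float, float]] = [(0.0, 0.0)] * len(rects)
    x = y = shelf_depth = used_width = 0.0
    for i in order:
        w, d = rects[i]
        if x > 0 and x + w > strip:        # no room left: open a new shelf
            y += shelf_depth
            x = shelf_depth = 0.0
        positions[i] = (x, y)
        x += w
        shelf_depth = max(shelf_depth, d)
        used_width = max(used_width, x)
    return positions, (used_width, y + shelf_depth)


# Four single genes (unit bases) form a 2 x 2 group; that group and two more
# genes are then packed to form their own parent group.
pos, group_base = shelf_pack([(1.0, 1.0)] * 4)
print(pos, group_base)    # [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)] (2.0, 2.0)
print(shelf_pack([group_base, (1.0, 1.0), (1.0, 1.0)])[1])   # (2.0, 3.0)
```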
3.4 Group Map
Single genes (corresponding to the rows in the data matrix) , are often linked to additional information that consists of relations among groups of genes (for example the miRNA target genes) ; these are non-exclusive groups (which means the same gene can be in more than one group) . The Group Map generates a wall on which these groups are represented.
Every group is displayed as a square with variable color, size and transparency. These parameters are used to represent group information as follows:
• the color is computed based on an intrinsic group value (static, defined as a parameter of the group itself), or based on the active selection of samples; the computation used for the color is equivalent to the one described for matrix data in paragraph 4.1 about color assignment;
• the size is computed dynamically based on which genes are currently selected, and represents the ratio between observed events and expected events for the active selection;
• the transparency is computed dynamically based on the selected genes, and represents the statistical significance of the selected population.
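The size and transparency rules of the Group Map can be sketched as below. The source states only what each property represents; the specific mappings used here (square area proportional to the observed/expected ratio, capped at `max_ratio`, and opacity growing linearly with -log10 of the p-value up to `p_floor`) are assumptions, as are the function and parameter names. The observed/expected ratio and the p-value could come, for example, from the hypergeometric calculation described in paragraph 1.2.

```python
import math
from typing import Tuple


def group_square(observed: int, expected: float, p_value: float,
                 base_side: float = 1.0, max_ratio: float = 10.0,
                 p_floor: float = 1e-10) -> Tuple[float, float]:
    """Return (side, opacity) for one Group Map square.

    side:    area proportional to observed/expected, capped at max_ratio
    opacity: 0.0 (fully transparent) at p = 1, 1.0 (opaque) at p <= p_floor
    """
    ratio = observed / expected if expected > 0 else 0.0
    side = base_side * math.sqrt(min(ratio, max_ratio))
    p = min(max(p_value, p_floor), 1.0)            # keep log10 finite
    opacity = math.log10(p) / math.log10(p_floor)  # both logs are <= 0
    return side, opacity


print(group_square(observed=4, expected=1.0, p_value=1e-5))  # side 2.0, opacity ~0.5
```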
For further details about the computations and the statistical operations used by the system 1, see the following section 4 (and related paragraphs) concerning the graphical metaphors.
4. Detailed definition of the graphical metaphors
The system 1, in use, builds a graphical virtual environment with well-defined display rules so that, after a learning period, the association between given graphical elements and the variables they represent becomes intuitive and immediately understandable. In this way even a very complex set of data can be quickly and efficiently displayed and analyzed. The base graphical elements are simple shapes, either angular (square, rectangle, parallelepiped) or rounded (circle, cylinder), whose characteristics may simultaneously encode different types of data, as specified below.
4.1 Color assignment
The color of a graphical element is defined using a range of reference values and comparing the value to be represented with that range. The range is defined by a minimum value, a specific reference value (called for simplicity the "central value") and a maximum value. If the value to be represented is greater than the central value, the color is computed as an interpolation between the central color and the maximum color; if, instead, the value is less than the central value, the interpolation is done between the central color and the minimum color. A user can set both the reference values and the corresponding colors for the minimum, the maximum and the center of the range; alternatively, predefined colors and reference values can be used (for example, dark grey for the center, red for values greater than the center and green for values less than the center, with values between -1 and +1 centered at 0).
To better describe the color assignment, a practical example is described below. Suppose that the value 1.5 is to be represented and that the range has been set as follows:
• minimum value equal to -3 and represented by green color (expressed in RGB format as 0, 255, 0);
• central value equal to 0 and represented by black color (expressed in RGB format as 0, 0, 0);
• maximum value equal to +3 and represented by red color (expressed in RGB format as 255, 0, 0).
The value 1.5 is in the positive range, between the central value and the maximum value; more precisely, in this case it lies exactly halfway along the positive sub-range. The color representing 1.5 is therefore half of the maximum color, that is, in RGB format, 127, 0, 0.
Note also that the definition of the range of values clamps any value that is outside that range; in the example described above, a value of -3.5 is considered as -3 and, therefore, it corresponds to the color assigned to the minimum value, which is green.
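The color rule of paragraph 4.1, including clamping, can be written as a short function. This Python sketch truncates interpolated channels to integers, which reproduces the 127, 0, 0 of the example above (rounding would give 128); the function name and argument order are illustrative.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def value_to_color(value: float, vmin: float, center: float, vmax: float,
                   c_min: RGB, c_center: RGB, c_max: RGB) -> RGB:
    """Interpolate between the central color and the minimum or maximum color,
    depending on which side of the central value `value` lies."""
    value = max(vmin, min(vmax, value))          # clamp to the reference range
    if value >= center:
        t = (value - center) / (vmax - center)   # 0 at center, 1 at vmax
        target = c_max
    else:
        t = (center - value) / (center - vmin)   # 0 at center, 1 at vmin
        target = c_min
    r, g, b = (int(a + t * (z - a)) for a, z in zip(c_center, target))
    return r, g, b


GREEN, BLACK, RED = (0, 255, 0), (0, 0, 0), (255, 0, 0)
print(value_to_color(1.5, -3, 0, 3, GREEN, BLACK, RED))    # (127, 0, 0)
print(value_to_color(-3.5, -3, 0, 3, GREEN, BLACK, RED))   # (0, 255, 0)
```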
4.2 Size properties
The height of an element is directly defined by the value it represents; a user has a control element to adjust the amplification of that value, so as to increase or reduce the overall aspect as desired.
Width instead has different meanings depending on the tridimensional wall. • For the objects in the Genomic Matrix, the base width directly indicates the number of included matrix cells. For the smallest parallelepipeds it is a single cell, for the cluster ones it is a higher number (indicating how many elements are contained in the cluster) .
• For the Chromosome Map, single genes are displayed as transversal lines/bands within the chromosome they belong to, as far as the display resolution allows at a given zoom level. Bands can be raised to form parallelepipeds whose color and height represent different measurements made on the same gene; typically, for example, color represents gene expression and height represents the copy number of that gene.
• For the Object Map, as for the Chromosome Map, the base of each parallelepiped is directly proportional to the number of contained objects.
• For the Group Map, height has no specific meaning because the single elements are represented as flat squares. The element width, instead, directly measures a dynamic value of the object (normally the "observed events/expected events" ratio for that group, although other functions can be chosen and applied).
4.3 Particular graphical effects
The system 1 can conveniently use a series of particular graphical effects to represent status information of an element.
• A blinking color helps highlight an active or selected object. For example, on the walls displaying an Object Map, the system 1 highlights the selected objects with a semi-transparent box of a particular color that blinks. Different colors can also be assigned to different selection types (for example, in the case of a differential selection, items in the primary selection are highlighted with a color different from the one used for items in the secondary selection).
• System 1 can also conveniently use transparency effects. In particular, if a color is used to display an object value, its transparency indicates the statistical significance, where meaningful. Objects that are more transparent and, thus, less visible, indicate less statistically significant values, while more opaque and solid objects highlight the most relevant values.
• Finally, the system 1 can conveniently also use graphical effects based on the border color and thickness of the displayed elements. For example, for elements with no height (as in the case of the Group Map), a symbolic meaning can also be associated with the object border, using both color and thickness as dynamic values. The system 1 typically uses this graphical metaphor to represent values with coarse dynamic variations, preferably of the "none/all" kind, or values distributed over a limited number of categories, such as the clinical variables associated with the patients from whom the data samples derive (response/non-response to a given drug, recurrence/non-recurrence of illness, etc.). Border thickness can also represent the statistical significance of a value very effectively, with the thickness inversely proportional to the p-value. In this case, whatever the border color, a value that is not significant produces a very thin border that is therefore not visible.
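The two significance encodings described above (opacity and border thickness) can be sketched as below. The source states only the qualitative relationships: less significant values are drawn more transparent, and border thickness is inversely proportional to the p-value. Everything else here is a hypothetical choice for illustration: the logarithmic opacity scale (a swapped-in technique, not one the source specifies), the `P_FLOOR` saturation point, the constant `K` and the `MAX_THICKNESS` cap. The cap is needed because a purely inverse relationship grows without bound as the p-value approaches zero.

```python
import math

# Hypothetical tuning constants: the source does not specify any of these.
P_FLOOR = 1e-4       # p-values at or below this are drawn fully opaque
K = 0.05             # proportionality constant: thickness = K / p (pixels)
MAX_THICKNESS = 5.0  # cap on border width, in pixels


def _check(p_value: float) -> None:
    # Reject values that are not valid p-values for this mapping.
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")


def significance_to_alpha(p_value: float) -> float:
    """Opacity in [0, 1]: p = 1 is fully transparent, p <= P_FLOOR is opaque.

    Uses a log10 scale so that each tenfold drop in p adds the same opacity.
    """
    _check(p_value)
    alpha = math.log10(p_value) / math.log10(P_FLOOR)
    return min(1.0, max(0.0, alpha))


def significance_to_border(p_value: float) -> float:
    """Border thickness inversely proportional to the p-value, capped."""
    _check(p_value)
    return min(MAX_THICKNESS, K / p_value)


if __name__ == "__main__":
    for p in (0.5, 0.05, 0.01, 1e-6):
        print(f"p={p:g}  alpha={significance_to_alpha(p):.2f}  "
              f"border={significance_to_border(p):.2f}px")
```

With these constants a non-significant value such as p = 0.5 gets a 0.1-pixel border, which is effectively invisible, matching the behavior described in the text.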
5. Final Remarks
The advantages of this invention can be immediately understood from the previous description.
In particular, it is important to emphasize that the system for biomolecular data display and analysis defined by this invention overcomes the technical limitations of currently known systems, because it makes it possible to visualize large amounts of biomolecular data, in particular genomic data, in a way that is easily readable for a user; to analyze and process large amounts of such data interactively and extremely quickly; and to display the analysis and processing results in real time, again in a way that is easily readable for a user.
Besides, thanks to the graphical functionalities described above, the system defined by this invention makes it possible to display heterogeneous data (that is, data of different types) for the same objects simultaneously, through the use of different graphical metaphors and, in particular, of different colors, multiple dimensions and particular graphical effects. In this way, the system defined by this invention makes the process of integrative analysis of the data and of the processing results dramatically faster and more efficient for a user.
Finally, it is clear that numerous modifications and variants can be made to the present invention, all falling within the scope of protection of the present invention, as defined in the appended claims.

Claims

1. Electronic system (1) for biomolecular data display and analysis, configured to:
• display a tridimensional virtual environment including one or more tridimensional walls, wherein each displayed tridimensional wall includes respective tridimensional graphical elements and represents biomolecular data by means of the respective tridimensional graphical elements;
• allow a user to
surf the displayed tridimensional virtual environment,
modify the display resolution of the displayed tridimensional virtual environment,
select one or more tridimensional graphical elements of one or more displayed tridimensional walls, and
activate one or more predefined analysis and/or processing functions for one or more selected tridimensional graphical elements;
• modify interactively and in real time the displayed tridimensional virtual environment depending on the user's surfing and in response to a user's modification of the display resolution; and
• in response to the user's activation of one or more predefined analysis and/or processing functions for one or more selected tridimensional graphical elements of one or more displayed tridimensional walls,
apply said predefined analysis and/or processing function(s) to the biomolecular data represented by the selected tridimensional graphical element(s), and
modify interactively and in real time the selected tridimensional graphical element(s) depending on one or more results obtained by applying said predefined analysis and/or processing function(s).
2. The electronic system of claim 1, wherein the tridimensional graphical elements of a displayed tridimensional wall are spatially arranged according to a given spatial arrangement and are grouped according to one or more grouping criteria;
wherein a displayed tridimensional wall represents the biomolecular data also by means of the spatial arrangement and of the grouping(s) of the respective tridimensional graphical elements;
and wherein the electronic system (1) is further configured to:
• allow a user to modify the spatial arrangement and/or the grouping criterion/a of one or more tridimensional graphical elements of a displayed tridimensional wall; and,
• in response to a user's modification of the spatial arrangement and/or of the grouping criterion/a of one or more tridimensional graphical elements of a displayed tridimensional wall, modify interactively and in real time the spatial arrangement and/or the grouping(s) of said tridimensional graphical element(s) in said displayed tridimensional wall.
3. The electronic system according to claim 1 or 2, wherein a displayed tridimensional wall includes several layers of tridimensional graphical elements arranged according to one or more layer hierarchies and represents the biomolecular data at different levels of detail by means of the different layers of tridimensional graphical elements;
and wherein the electronic system (1) is further configured to:
• allow a user to modify the layer hierarchy/ies for a displayed tridimensional wall; and,
• in response to a user's modification of the layer hierarchy/ies for a displayed tridimensional wall, modify interactively and in real time the layers of tridimensional graphical elements of said displayed tridimensional wall.
4. The electronic system according to any preceding claim, wherein the displayed tridimensional virtual environment includes:
• a first tridimensional wall that represents a first set of biomolecular data by means of first tridimensional graphical elements; and
• a second tridimensional wall that represents a second set of biomolecular data related to the first set of biomolecular data, said second tridimensional wall representing said second set of biomolecular data by means of second tridimensional graphical elements related to the first tridimensional graphical elements;
and wherein the electronic system (1) is further configured to, in response to the user's activation of one or more predefined analysis and/or processing functions for one or more selected first tridimensional graphical elements, modify interactively and in real time the second tridimensional graphical elements related to said selected first tridimensional graphical elements depending on one or more results obtained by applying said predefined analysis and/or processing function(s).
5. The electronic system according to any preceding claim, wherein the displayed tridimensional virtual environment includes:
• a first tridimensional wall representing a first set of biomolecular data by means of first tridimensional graphical elements; and
• a second tridimensional wall representing a second set of biomolecular data related to the first set of biomolecular data, said second tridimensional wall representing said second set of biomolecular data by means of second tridimensional graphical elements related to the first tridimensional graphical elements;
and wherein the electronic system (1) is further configured to, in response to the user's selection of one or more first tridimensional graphical elements, highlight interactively and in real time the second tridimensional graphical elements related to said selected first tridimensional graphical elements.
6. The electronic system according to any preceding claim, wherein a displayed tridimensional wall represents the biomolecular data also by means of graphical attributes of the respective tridimensional graphical elements.
7. The electronic system of claim 6, further configured to graphically render at least a specific graphical attribute of the tridimensional graphical elements of a displayed tridimensional wall based on values of the biomolecular data to be represented, on one or more ranges of admissible values for the values of the biomolecular data to be represented, on one or more reference values for the values of the biomolecular data to be represented, and on graphical rendering parameters of said specific graphical attribute associated with said range(s) of admissible values and with said reference value(s).
8. The electronic system of claim 7, further configured to allow a user to set up and/or modify:
• the range(s) of admissible values for the values of the biomolecular data to be represented;
• the reference value(s) for the values of the biomolecular data to be represented; and
• the graphical rendering parameters of the specific graphical attribute associated with said range(s) of admissible values and with said reference value(s).
9. The electronic system according to claim 7 or 8, wherein the specific graphical attribute is the color of the tridimensional graphical elements.
10. The electronic system according to any of claims 6 to 9, wherein the graphical attributes of a tridimensional graphical element of a displayed tridimensional wall include the color and the size of said tridimensional graphical element.
11. The electronic system according to any preceding claim, wherein the predefined analysis and/or processing function(s) is/are predefined statistical analysis and/or statistical processing function(s).
12. The electronic system according to any preceding claim, further configured to allow a user to set up and/or modify the predefined analysis and/or processing function(s).
13. The electronic system according to any preceding claim, wherein a displayed tridimensional wall represents:
• a genomic matrix by means of respective tridimensional graphical elements, each of which represents one or more values measured in a respective sample with respect to a respective gene; or
• a chromosome map by means of respective tridimensional graphical elements representing the genome chromosomes, the chromosomes' arms, the arms' bands and the bands' sub-bands; or
• an object map by means of respective tridimensional graphical elements representing values of specific biomolecular data and of groupings of said specific biomolecular data; or
• a group map by means of respective tridimensional graphical elements representing values of groups of biomolecular data and information items associated with said groups.
14. Computer product including software code portions that are:
• executable by computing and processing means (15) of an electronic system (1) that includes display means (12) and user input means (13); and
• such as to cause, when executed, said electronic system (1) to become configured as the electronic system for biomolecular data display and analysis claimed in any preceding claim.
PCT/IB2012/056598 2011-11-21 2012-11-21 Interactive system for display and analysis of biomolecular data WO2013076665A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP11425282.8 2011-11-21
EP11425282 2011-11-21

Publications (1)

Publication Number Publication Date
WO2013076665A1 (en) 2013-05-30

Family

ID=47522745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2012/056598 WO2013076665A1 (en) 2011-11-21 2012-11-21 Interactive system for display and analysis of biomolecular data

Country Status (1)

Country Link
WO (1) WO2013076665A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090307218A1 (en) * 2005-05-16 2009-12-10 Roger Selly Associative memory and data searching system and method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Pacific Symposium on Biocomputing 2011", 1 November 2010, WORLD SCIENTIFIC, ISBN: 978-9-81-433505-8, article JASON H. MOORE ET AL: "HUMAN MICROBIOME VISUALIZATION USING 3D TECHNOLOGY", pages: 154 - 164, XP055056097, DOI: 10.1142/9789814335058_0017 *
LEISHI ZHANG ET AL: "3D Visualization of Gene Clusters and Networks", VISUALIZATION AND DATA ANALYSIS 2005, 1 January 2005 (2005-01-01), pages 316 - 326, XP055056162, Retrieved from the Internet <URL:http://spiedigitallibrary.org/data/Conferences/SPIEP/25904/316_1.pdf> [retrieved on 20130312] *
MICHAEL SCHROEDER ET AL: "Approaches to visualisation in bioinformatics: from dendrograms to Space Explorer", INFORMATION SCIENCES, vol. 139, no. 1-2, 1 November 2001 (2001-11-01), pages 19 - 57, XP055056155, ISSN: 0020-0255, DOI: 10.1016/S0020-0255(01)00156-6 *
Y. YANG ET AL: "Integration of metabolic networks and gene expression in virtual reality", BIOINFORMATICS, vol. 21, no. 18, 15 September 2005 (2005-09-15), pages 3645 - 3650, XP055056154, ISSN: 1367-4803, DOI: 10.1093/bioinformatics/bti581 *

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 12813112; country of ref document: EP; kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: PCT application non-entry in European phase
Ref document number: 12813112; country of ref document: EP; kind code of ref document: A1